feat(rocpd): add AI-powered GPU trace analysis module (rocpd analyze) #4030
analyze.py:
- Bug: execute() passed CLI key 'format' to analyze_performance() which
expects 'output_format', so --format json/markdown was silently ignored
and text was always written. Fix by mapping the key before the call.
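A minimal sketch of the key-mapping fix described above. The function bodies are hypothetical stand-ins (the real `execute()` and `analyze_performance()` are not shown in this log); only the `format` → `output_format` rename is from the commit message:

```python
# Hypothetical sketch: the CLI namespace key is "format", but the analysis
# entry point expects "output_format"; without the rename the keyword is
# ignored and the default ("text") is always used.
def analyze_performance(output_format="text", **_ignored):
    return "writing report as " + output_format

def execute(cli_args):
    kwargs = dict(cli_args)
    # Map the CLI key to the keyword the analysis function expects.
    kwargs["output_format"] = kwargs.pop("format", "text")
    return analyze_performance(**kwargs)

print(execute({"format": "json"}))  # → writing report as json
```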
cmake/Modules/rocprofiler-sdk-utilities.cmake:
- rocprofiler_sdk_pc_sampling_disabled and
rocprofiler_sdk_pc_sampling_stochastic_disabled called list(GET ...)
on the result of rocprofiler_sdk_get_gfx_architectures without
guarding against an empty list. On build machines without GPUs
(CI containers, cross-compile hosts) CMake configure failed with
"list GET given empty list". Add length check and early-return with
PC sampling disabled when no GPUs are present.
tests/CMakeLists.txt:
- rocprofiler-sdk-tests-gfx-info was left empty on no-GPU hosts,
causing all sub-CMakeLists that do list(GET rocprofiler-sdk-tests-gfx-info 0 ...)
to fail at configure time. Populate the variable with placeholder
"gfx000" when no hardware is detected; this matches none of the
known GPU patterns so all hardware-dependent tests are correctly
disabled while configure completes without errors.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Add _format_as_webview() function: self-contained HTML report with AMD dark theme, interactive sortable tables, SVG donut gauges for GPU util and wave occupancy, collapsible recommendation cards with priority color-coding, stacked execution breakdown bar, and copy-to-clipboard profiling commands. No external CDN dependencies.
- Wire 'webview' format into format_analysis_output() dispatch
- Add 'webview' to --format CLI choices (text/json/markdown/webview)
- Fix output file extension: execute() now appends .txt/.json/.md/.html automatically based on the selected format, so output files always have the correct extension
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- README.md: add webview to feature list, CLI examples, data flow diagram, AnalysisResult method list, and a new Example 4 section
- AI_ANALYSIS_API.md: add webview to feature list and AnalysisResult methods; document each format's output file extension (.txt/.json/.md/.html); add full Webview section under Output Formats covering features, CLI usage, and Python API usage
- SCHEMA_CHANGELOG.md: add v0.1.1 entry noting webview format addition and auto-extension behavior (no JSON schema changes)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Add a pure CSS+JS floating tooltip system to the webview HTML report so every visual element explains itself on hover. No external deps.

Tooltips added to:
- Gauge widgets (GPU Utilization, Wave Occupancy): explain the underlying hardware counter formula (GRBM_GUI_ACTIVE/GRBM_COUNT, SQ_WAVES), target thresholds, and current status
- Execution breakdown: stacked bar segments and individual bars for Kernel Execution, Memory Copies, API Overhead, and GPU Idle — each explains what the metric means, good/bad thresholds, and how to fix
- Overview stat cards: Primary Bottleneck (per-type explanation of what it means and how to address it), Total Runtime, Kernel Time, Analysis Tier (explains Tier 1 vs Tier 2 and how to upgrade)
- Hotspot table column headers: Calls, Total/Avg/Min Time, % Total
- Memory transfer table: direction cells (H2D, D2H, D2D, P2P with PCIe/HBM bandwidth context) and all column headers
- Hardware counter table rows (via COUNTER_TIPS JS lookup): GRBM_COUNT, GRBM_GUI_ACTIVE, SQ_WAVES, SQ_WAVE_CYCLES, SQ_INSTS_VALU/SALU/VMEM_RD/VMEM_WR/LDS/SMEM, FETCH_SIZE, WRITE_SIZE, TCP/TCC cache counters, TA_TA_BUSY, and more. Unknown counters get a generic fallback message.

Implementation details:
- #tt floating div follows mouse cursor, repositions at viewport edges
- [data-tip] elements use single-quoted HTML attributes; tip content can include <strong>, <em>, <code>, .tok/.twarn colored spans
- Counter tips use data-ctr attribute + JS COUNTER_TIPS object lookup to decouple tip content from Python string generation
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- AI_ANALYSIS_API.md: expand Webview features list with full tooltip coverage details — gauges (counter formula, thresholds), breakdown bars, overview stats (per-bottleneck guidance), hotspot columns, memory direction cells, and 20+ AMD GPU hardware counter definitions
- README.md: add tooltip note to Example 4 (Interactive HTML Webview) explaining that every visual element is self-documenting on hover
- SCHEMA_CHANGELOG.md: add v0.1.2 entry — no schema changes; notes the COUNTER_TIPS JS lookup, tooltip coverage, and fallback behavior for unknown counters
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Overhaul the --format webview HTML report inspired by AMD dashboard design patterns for a cleaner, more scannable interface:
- Light/Dark theme toggle with localStorage persistence (defaults dark)
- Sticky header with AMD gradient, status summary badges (Critical/Warning/Low/Info counts from recommendations), and metric pills row (runtime, kernel count, analysis tier, timestamp, DB path)
- Status-colored KPI cards in overview: kernel %, bottleneck type, total runtime, and tier each have a colored top border (ok/warn/crit) reflecting health status at a glance
- Section card pattern (.scard) with icon+title+badge headers throughout
- Priority icons on recommendation cards: 🔴 HIGH 🟠 MEDIUM 🟡 LOW ℹ INFO
- Gradient execution breakdown bars and grid-aligned legend rows
- FAB scroll-to-top button (appears after 250px scroll)
- Staggered @keyframes fadeInUp entrance animations on section cards
- Improved typography (system font stack; works fully offline)
- Gauge cards: background fill + hover border effect (Tier 2)
- Improved table headers: uppercase + 2px bottom border

Also updates SCHEMA_CHANGELOG.md (v0.1.3), README.md, and AI_ANALYSIS_API.md to document all new webview UI features. No changes to JSON output schema or analysis logic.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
CSS `content` property does not process HTML entities. Replace `content:'&rarr;'` with `content:'→'` (U+2192) in the .findings li::before rule so the right-arrow bullet renders correctly instead of displaying as the literal text '&rarr;'. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Documents the root cause and fix for the key findings bullet icons rendering as literal HTML entity text in the webview report. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
The #tt floating tooltip used color:var(--text) which in light mode resolves to ~#181828 (near-black) — invisible against the always-dark #0e0e1c tooltip background. Replace with a fixed light color (#dde0f2) so the tooltip remains readable regardless of the active theme. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Documents the root cause and fix for tooltip text being invisible in light theme (color:var(--text) resolving to near-black against an always-dark tooltip background). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
The recommendation engine was suggesting rocprofv3 flags (e.g. --hip-api-trace, --hsa-trace) that were already covered by the user's original --sys-trace run, creating confusing advice.

Fix: inspect the database before generating recommendations to infer which collection flags were already used:
- kernels rows → --kernel-trace covered
- regions rows → --hip-trace / --hsa-trace covered (API spans)
- memory_copies rows → --memory-copy-trace covered
- kernels + regions → full --sys-trace implied (subsumes all trace flags)

Redundant flags are stripped from recommended rocprofv3 commands. Commands whose stripped flags leave nothing new to collect are dropped entirely. rocprof-sys and rocprof-compute commands are always preserved (different tool, always a new perspective).

New helpers: _detect_already_collected(), _filter_rec_commands(), _SYS_TRACE_IMPLIED constant. generate_recommendations() gains an already_collected parameter; analyze_performance() calls the detector and threads the result through.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
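The detection-and-strip logic above can be sketched roughly as follows. The function names come from the commit message, but the signatures and table-to-flag mapping here are illustrative assumptions, not the real implementation:

```python
# Illustrative sketch: infer which rocprofv3 collection flags are already
# covered by the rows present in the rocpd database.
def detect_already_collected(has_kernels, has_regions, has_memory_copies):
    covered = set()
    if has_kernels:
        covered.add("--kernel-trace")
    if has_regions:
        covered.update({"--hip-trace", "--hsa-trace"})
    if has_memory_copies:
        covered.add("--memory-copy-trace")
    if has_kernels and has_regions:
        covered.add("--sys-trace")  # full sys-trace subsumes the trace flags
    return covered

def strip_redundant(cmd_flags, covered):
    # Empty result means the recommended command collects nothing new
    # and should be dropped entirely.
    return [f for f in cmd_flags if f not in covered]

covered = detect_already_collected(True, True, True)
print(strip_redundant(["--hip-trace", "--pmc"], covered))  # → ['--pmc']
```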
…esent

rocprof-sys --trace collects the same HIP/HSA API call data as rocprofv3 --sys-trace (just in Perfetto format instead of rocpd). Treat it as equivalent and drop it when sys-trace data is already in the database.

Rules in _filter_rec_commands() are now per-tool:
- rocprofv3: strip covered flags; drop if nothing meaningful remains
- rocprof-sys: drop if only --trace (≡ sys-trace); keep when it carries extra flags like --trace-gpu-memory that rocprofv3 can't
- rocprof-compute: always keep (deep hardware counter analysis)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
The LLM was recommending flags already covered by the user's original --sys-trace run (e.g. --hip-api-trace, --hsa-trace, rocprof-sys --trace). Add a new "Context-Aware Profiling Recommendations" section to the LLM reference guide (the "fence") that explicitly instructs the model to:
1. Read profiling_info.profiling_mode to identify what was already collected
2. Know that --sys-trace subsumes --hip-trace, --hsa-trace, --hip-api-trace, --kernel-trace, --memory-copy-trace, --marker-trace, --roctx-trace
3. Know that rocprof-sys --trace is equivalent to --sys-trace (same API data, different format) and must not be recommended when sys-trace exists
4. Only recommend the INCREMENTAL next step (--pmc, rocprof-compute, etc.)
5. State "no additional run needed" when all required data is present

Also add an explicit prohibition in the "What NOT to Do" section.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Instead of enumerating every flag equivalence (--sys-trace subsumes --hip-trace, --hsa-trace, etc.), instruct the LLM to reason from the tool documentation already present in the guide to determine flag overlap and tool equivalence itself. The "Context-Aware Profiling Recommendations" section is now concise: tell the model what to do (read profiling_mode, use the docs to reason about equivalence, recommend only the incremental next step) without hardcoding every combination that should be in the model's reasoning. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Suppresses .claude/, __pycache__/, *.pyc, and rocpd-output-data/ from appearing as untracked files in git status. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Fixes all 13 issues from the deep-research-report audit:
Critical:
- AIA-001: fix analyze_database() — call individual analysis functions
(compute_time_breakdown, identify_hotspots, analyze_memory_copies,
analyze_hardware_counters, generate_recommendations) instead of the
broken analyze_performance() wrapper that returns str not dict
High:
- AIA-002: fix _build_analysis_result() key mapping (issue/suggestion/
estimated_impact/actions, uppercase priority comparison)
- AIA-003: add WEBVIEW to OutputFormat enum
- AIA-004: fix to_json() to return schema-conformant output via
format_analysis_output(); add to_webview() method; store raw payloads
as result._raw for schema-conformant serialization
- AIA-012: create ai_analysis/tests/test_api_standalone.py (23 tests)
and tests/rocprofv3/rocpd/test_ai_analysis_standalone.py; update docs
Medium:
- AIA-005: re-raise LLMAuthenticationError/LLMRateLimitError instead of
silently downgrading to warnings
- AIA-006: fix _convert_result_to_llm_format() to use real hotspot/
memory/counter data from result._raw instead of empty placeholders
- AIA-007: implement file path redaction in _sanitize_data() using regex
- AIA-008: ReferenceGuideNotFoundError now lists all attempted paths;
get_reference_guide_path() collects all paths before raising
- AIA-009: add DEFAULT_ANTHROPIC_MODEL/DEFAULT_OPENAI_MODEL constants;
model names configurable via ROCPD_LLM_MODEL env var and new
--llm-model CLI flag
- AIA-013: fix validate_database() to query type IN ('table','view')
Low:
- AIA-010: fix Optional type hints in exceptions.py
- AIA-011: export ReferenceGuideNotFoundError from __init__.py
Additional:
- Add --llm-model CLI flag to rocpd analyze (passes model to LLMAnalyzer
via ROCPD_LLM_MODEL env var with proper save/restore)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
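For AIA-007 above, the PR text only says path redaction is regex-based; the pattern and helper below are an assumed sketch of how `_sanitize_data()` might scrub filesystem paths before anything is sent to an LLM:

```python
import re

# Assumed sketch of file-path redaction (AIA-007): replace absolute
# POSIX-style paths with a placeholder so source locations never leave
# the machine. The real regex in _sanitize_data() is not shown in the PR.
PATH_RE = re.compile(r"(/[\w.\-]+)+")

def redact_paths(text):
    return PATH_RE.sub("<path>", text)

print(redact_paths("kernel at /home/user/src/gemm.hip:42"))
# → kernel at <path>:42
```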
sanitize_input_list() iterates over its argument, so passing a plain str causes it to iterate over individual characters (e.g. 'p', 'r', 'o', ...). Wrap the single path string in a list in both analyze_database() and validate_database() so the path is treated as one item.

Fixes: analyze_database() returning 0 kernels when called via the Python API even though the CLI works correctly.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
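The str-vs-list pitfall is easy to reproduce. The body of `sanitize_input_list()` here is a hypothetical stand-in; only the iteration behavior it demonstrates is from the commit message:

```python
# Minimal reproduction: iterating a str yields its characters, so a bare
# path string is silently exploded into single-character "items".
def sanitize_input_list(items):
    return [str(i).strip() for i in items]

print(sanitize_input_list("prof.db"))    # each character becomes an item
print(sanitize_input_list(["prof.db"]))  # the fix: wrap the path in a list
```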
- Add if __name__ == '__main__' entry point to test_ai_analysis_standalone.py so it can be invoked directly by Python (required for CTest integration)
- Add configure_file() to copy test file to build directory at cmake time
- Add rocprofiler_add_integration_execute_test() registering rocprofv3-test-rocpd-ai-analysis-unit-tests (test #597) with labels integration-tests;rocpd;pytest and 120s timeout
- 23 tests pass via: ctest -R rocprofv3-test-rocpd-ai-analysis-unit-tests -V
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…fy output format

- llm_analyzer.py: try max_completion_tokens first (required by gpt-5, o1, o3, and newer gpt-4o variants); fall back to legacy max_tokens transparently if the model reports max_completion_tokens as unsupported (old models)
- analyze.py: print a format hint when output defaults to text (.txt), so users know to add --format webview / --format json / --format markdown
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
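The try-first-then-fall-back pattern can be sketched as below. The `client` callable stands in for the real OpenAI SDK call, and `TypeError` stands in for the API's "unsupported parameter" error; both substitutions are assumptions for illustration:

```python
# Hedged sketch of the token-parameter fallback: attempt the newer
# max_completion_tokens keyword first, retry with legacy max_tokens
# if the model (here simulated by a plain function) rejects it.
def create_completion(client, **kwargs):
    try:
        return client(max_completion_tokens=1024, **kwargs)
    except TypeError:  # stand-in for the API's unsupported-parameter error
        return client(max_tokens=1024, **kwargs)

def old_model_client(max_tokens=None):
    # Simulates an older model that only understands max_tokens.
    return {"used": "max_tokens", "value": max_tokens}

print(create_completion(old_model_client))
# → {'used': 'max_tokens', 'value': 1024}
```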
…ters

The recommendation engine was suggesting commands like:
rocprofv3 --pmc GRBM_COUNT GRBM_GUI_ACTIVE SQ_WAVES ...
even when those exact counters were already present in pmc_events.

Root cause: _detect_already_collected() tracked trace flags (--sys-trace, --kernel-trace, etc.) but never inspected pmc_events for counter names. _filter_rec_commands() only checked command flags, not --pmc arg values.

Fixes:
- _detect_already_collected(): query pmc_events for DISTINCT counter_name; add "pmc:<NAME>" entries to the covered frozenset for each counter found
- _filter_rec_commands(): for rocprofv3 commands, strip already-collected counters from the --pmc arg value; drop --pmc entirely if all counters are covered; treat --kernel-names as a scope filter (not data collection) so a command reduced to only scope+output args is dropped cleanly; append note listing removed counters to recommendation description
- Add 7 unit tests covering full/partial/zero PMC stripping, full_command update, description note, kernel-names-only drop, and rocprof-compute always-kept behavior
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
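The "pmc:<NAME>" bookkeeping above suggests a filter along these lines. The helper name and return shape here are illustrative assumptions; only the prefix convention and the strip/drop behavior come from the commit message:

```python
# Sketch of --pmc deduplication: remove counters already present in
# pmc_events (recorded as "pmc:<NAME>" in the covered set) from a
# recommended --pmc value; an empty remainder means drop --pmc entirely.
def strip_collected_pmc(pmc_counters, covered):
    covered_names = {c[len("pmc:"):] for c in covered if c.startswith("pmc:")}
    remaining = [c for c in pmc_counters if c not in covered_names]
    removed = [c for c in pmc_counters if c in covered_names]
    return remaining, removed

covered = frozenset({"pmc:GRBM_COUNT", "pmc:SQ_WAVES"})
remaining, removed = strip_collected_pmc(
    ["GRBM_COUNT", "SQ_WAVES", "FETCH_SIZE"], covered
)
print(remaining)  # → ['FETCH_SIZE']  (the only counter not yet collected)
```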
… hint

SCHEMA_CHANGELOG.md — add v0.1.8 entry covering:
- PMC counter deduplication: _detect_already_collected() now inspects pmc_events; _filter_rec_commands() strips already-collected counters from --pmc args and drops fully-redundant commands
- OpenAI max_completion_tokens compatibility for gpt-5/o1/o3
- Output format hint when text is the default
- CTest registration of 23 AI analysis API unit tests

AI_ANALYSIS_API.md:
- Add "Recommendation Deduplication" section explaining the PMC and trace-flag deduplication table and behavior
- Note OpenAI model compatibility (max_completion_tokens auto-fallback)

CLAUDE.md:
- Bump schema version reference: v0.1.1 → v0.1.8
- Update test count: 69 → 76 (7 new PMC filter tests)
- Add PMC deduplication and OpenAI compat notes to Python API section
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…nter knowledge

Incorporate knowledge from four AMD ROCm profiling blog articles to improve LLM-guided analysis quality and progressive recommendation accuracy.

Key additions:
- Recommended AMD 3-step profiling workflow: rocprof-sys (system timeline) → rocprofv3 (hardware counters on hot kernels) → rocprof-compute (deep analysis); guide LLM to recommend only the incremental next step
- Amdahl's Law as the core prioritization principle (focus on kernels >10% of total time only)
- VGPR→Occupancy table for all CDNA architectures (32/64/96/128/168/256 VGPRs mapped to occupancy %)
- Hardware Counter Reference table with 10+ counters and derived metric formulas (GPU utilization, BW, L2 hit rate, VALU util, LDS util)
- Bandwidth formula: (FETCH_SIZE + WRITE_SIZE) * 64 bytes / duration_ns
- Memory Hierarchy section: VGPR→LDS→L1→L2→HBM with per-GPU cache sizes and hit-rate thresholds that indicate problems
- LDS bank conflicts: 32 banks, detection and avoidance patterns
- API/Launch Overhead as a new explicit bottleneck type
- ILP and HIP Streams as new optimization techniques
- Multi-GPU/MPI profiling guidance in the rocprof-sys section
- Ridge points per GPU: MI300X ~31, MI250X ~15, MI100 ~19 FLOP/Byte
- Confidence level examples with concrete counter-based phrasing
- Expanded GPU specs: SIMDs per CU (4), max waves per SIMD (8), L1 sizes
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
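The bandwidth formula quoted above is simple enough to write as a one-line helper. The counters are reported in 64-byte granules and the duration in nanoseconds, so bytes/ns conveniently equals GB/s:

```python
# (FETCH_SIZE + WRITE_SIZE) * 64 bytes / duration_ns, as quoted above.
# bytes per nanosecond is numerically identical to gigabytes per second.
def hbm_bandwidth_gbps(fetch_size, write_size, duration_ns):
    bytes_moved = (fetch_size + write_size) * 64
    return bytes_moved / duration_ns

print(hbm_bandwidth_gbps(1_000_000, 500_000, 100_000))  # → 960.0
```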
Implements interactive.py with SessionData, PersistentMenuItem, HistoryEntry dataclasses and SessionStore (save/load/find_by_source_dir) for --interactive session file I/O under ~/.rocpd/sessions/. All 5 unit tests pass. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Wrap load() body in try/except; on failure emit warnings.warn and return None
- Replace lambda sort key in find_by_source_dir with _safe_dt() using datetime.fromisoformat + fallback to datetime.min
- Remove redundant 'import dataclasses' inside to_dict() (already at module level)
- Widen SessionStore.__init__ type hint to Union[str, pathlib.Path]; add Union to imports
- Add 5 new tests: malformed JSON skipped, make_session_id slug/spaces/fallback, newest-first ordering
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
… resume prompt

Implements Task 2 of the interactive session feature:
- Add rendering helpers (_print, _input, _PRI_STYLE) with optional rich console support
- Add InteractiveSession class with main event loop, session init/resume logic, and save-on-quit
- Add _prompt_resume() for auto-detecting and offering to resume prior sessions
- Add _render_main_menu() showing persistent menu items from previous analyses
- Add stubs for _path_profiling(), _path_optimize(), _pursue_recommendation()
- Add TestInteractiveSessionMenu with 3 tests (new session, quit-saves, resume-loads)
- All 13 tests pass (10 existing TestSessionStore + 3 new TestInteractiveSessionMenu)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…remove duplicate import

- Wrap `_input()` in `run()` with try/except EOFError to call save-and-quit gracefully
- Print feedback message in `_prompt_resume()` when selection is out of range or unrecognized
- Remove duplicate `from rich.panel import Panel` inside `_render_main_menu()` (module-level import already covers it)
- Add 4 new tests: [s] save without quit, EOF exits cleanly, numeric entry pursues recommendation, invalid resume choice starts new session (17 tests total, all passing)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…k 3) Replace the _path_profiling stub with a full implementation that displays profiling commands from tier0 and existing recommendations, optionally annotates them via LLM (metadata only, no source text), prompts for a .db file path, runs Tier 1/2 analysis, and promotes resulting recommendations to the persistent menu. Add _collect_profiling_commands, _llm_annotate_profiling_plan, and _run_tier1_analysis helpers. Add TestPathProfiling with 2 tests; all 19 tests pass. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Add _update_checkpoint_with_run() to WorkflowSession that finds the most recent CheckpointRecord without a run attached, sets its run_index to the latest trace_history index, and computes performance_delta_pct from total_runtime_ns when two or more analysis snapshots are available. Hook the method into _phase3_run_profiler after both successful trace-run save sites (trace-files-found path and manual-DB-entry path). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…parate methods _update_checkpoint_with_run() was computing performance_delta_pct from analysis_history before Phase 4 had appended the current run's analysis, causing delta to always read stale data. Refactor: Phase 3 only sets run_index via the existing method; new _update_checkpoint_delta() is called from Phase 4 after _record_analysis() so analysis_history[-1] is always the current run. Add test_update_checkpoint_delta_noop_when_insufficient_history. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
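The delta computation being split out above reduces to comparing the last two analysis snapshots. The field names follow the commit messages (total_runtime_ns, analysis_history), but the dict-based shape is an assumption:

```python
# Sketch of _update_checkpoint_delta's core math: compare the newest
# analysis snapshot with the previous one; with fewer than two
# snapshots there is nothing to compare (the no-op case tested above).
def performance_delta_pct(analysis_history):
    if len(analysis_history) < 2:
        return None
    prev = analysis_history[-2]["total_runtime_ns"]
    curr = analysis_history[-1]["total_runtime_ns"]
    return (curr - prev) / prev * 100.0

history = [{"total_runtime_ns": 2_000_000}, {"total_runtime_ns": 1_500_000}]
print(performance_delta_pct(history))  # → -25.0 (negative = faster)
```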
Add _rollback_to_checkpoint, _blacklist_checkpoint, and _build_blacklist_block to WorkflowSession, plus _restore_from_snapshots helper. Rollback uses git fast path when commit is reachable, falls back to file_snapshots otherwise. 9 new tests added; all 45 workflow tests pass. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…ves session

- Remove early `return` in `_rollback_to_checkpoint` when target_cp_id==-1 and git is unavailable: execution now falls through to the cleanup section so checkpoints, trace_history, analysis_history, and iteration_count are always cleared even when file restore is impossible.
- Add `self._save_session()` at the end of `_blacklist_checkpoint` so the blacklisted flag is persisted to disk immediately after mutation.
- Add test `test_rollback_baseline_no_git_still_clears_state` to verify the baseline-no-git path clears all state (46 tests total, all passing).
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Add _show_checkpoint_picker() to WorkflowSession displaying a checkpoint table with performance deltas and prompting for optional blacklisting of regression checkpoints before restoring. Wire [b] into _phase5_rec_menu across all three menu paths (already_reprofiled, all_info, HIGH/MEDIUM). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Validate cp_id against actual cp_id set (not list length) to handle non-contiguous ids
- Show blacklist prompt for baseline rollback (not just partial rollbacks)
- Replace raw input() calls with _input() wrapper for EOFError safety
- Strengthen test assertion to verify _blacklist_checkpoint called with correct cp_id
When _build_blacklist_block() returns a non-empty string, prepend it to the suggestions passed to _llm_rewrite_file so the LLM avoids previously failed approaches when rewriting source files. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Add _teardown_checkpoints() to remove all checkpoint worktrees when a WorkflowSession exits (refs are preserved for GC protection). Add _prune_stale_worktrees() to clean up orphaned worktrees from crashed sessions at startup. Both are hooked into run(): pruning after _init_checkpoints(), teardown in the finally block. Current session worktrees are never pruned. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…e path Wrap remove_worktree calls in _teardown_checkpoints with try/except so any exception (e.g. FileNotFoundError when git is missing) cannot propagate out of the finally block and suppress _save_session. Also add an early-return guard in GitCheckpointManager.remove_worktree for empty worktree_path strings, preventing a spurious git error. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- CheckpointRecord dataclass with file_snapshots for offline restore
- GitCheckpointManager: git commit + update-ref + worktree add --detach per edit
- WorkflowState: repo_root, baseline_commit, checkpoints, active_checkpoint
- Phase 6: creates checkpoint after each AI edit batch
- Phase 3: records run_index and performance_delta_pct per checkpoint
- Phase 5: [b] rollback menu with checkpoint picker and blacklist prompt
- Blacklist: uses edit_summary directly; deduplicates; injects into Phase 6 LLM prompt
- Session exit: removes worktrees (refs stay for GC protection)
- Session start: dirty-tree abort; stale worktree pruning
- Fix: remove spurious _conv attribute from WorkflowSession (test_workflow_session_has_no_conv)
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Fix blacklist lost after rollback: persist blacklisted_approaches on WorkflowState
- Fix suggestions accumulating blacklist prefix on each retry: use effective_suggestions
- Fix cp_id lookup: use search-by-id instead of list index in rollback and blacklist
- Fix _gcm left set after dirty-tree abort in _init_checkpoints
- Fix pathlib.Path.exists mock in test to use return_value=False
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…ck formatting

- Remove is_dirty() from GitCheckpointManager — dirty working tree is not an obstacle because commit_files uses git add -- <specific_file> which only stages the exact files modified by each AI edit, leaving other in-progress changes untouched
- Remove the dirty-tree guard from _init_checkpoints() so sessions continue normally even when the repo has uncommitted changes
- Fix flake8 F841 in remove_worktree: drop unused result = assignment
- Apply black formatting to interactive.py and test_workflow.py
- Update tests: replace test_session_start_aborts_when_dirty with test_checkpoints_work_with_dirty_tree confirming checkpoints initialise successfully despite a dirty tree; remove two now-deleted is_dirty tests
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Pull request overview
This PR adds a new rocpd analyze module to generate offline, human-readable GPU trace insights (with optional LLM enhancement), plus supporting packaging/build integration and a substantial unit/integration test suite.
Changes:
- Adds AI analysis Python package (rocpd.ai_analysis), including persistent LLM conversation support and TraceLens-derived analysis utilities.
- Integrates the new analyze subcommand into the rocpd CLI and CMake test/packaging flows.
- Introduces extensive standalone/unit/integration tests for schema conformance, guide filtering, interactive workflow/checkpoints, and TraceLens port logic.
Reviewed changes
Copilot reviewed 29 out of 36 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| projects/rocprofiler-sdk/tests/rocprofv3/rocpd/test_guide_filter_standalone.py | Adds standalone tests for guide section tag selection and filtering logic. |
| projects/rocprofiler-sdk/tests/rocprofv3/rocpd/test_analyze_schema.py | Adds schema structure + conformance tests (incl. Tier 0 and combined outputs) with Py3.6 shim. |
| projects/rocprofiler-sdk/tests/rocprofv3/rocpd/test_ai_analysis_standalone.py | Adds standalone API + regression tests for AI analysis behaviors and security/correctness fixes. |
| projects/rocprofiler-sdk/tests/rocprofv3/rocpd/CMakeLists.txt | Wires rocpd analyze into integration tests and runs standalone pytest-based test scripts. |
| projects/rocprofiler-sdk/tests/pytest-packages/pytest_utils/perfetto_reader.py | Minor SQL formatting cleanup in trace reader query. |
| projects/rocprofiler-sdk/source/scripts/format-deps.py | Removes unused import and reformats argparse definition. |
| projects/rocprofiler-sdk/source/lib/python/utilities.cmake | Installs analyze.py, tracelens_port.py, and copies ai_analysis runtime assets (excluding tests). |
| projects/rocprofiler-sdk/source/lib/python/rocpd/tracelens_port.py | Adds TraceLens-derived interval/categorization/short-kernel analysis utilities. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/tests/test_workflow.py | Adds mock-based tests for workflow session phases and git checkpoint manager behavior. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/tests/test_tracelens_port.py | Adds unit + optional integration tests for tracelens_port functions. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/tests/test_local_llm.py | Adds tests for local OpenAI-compatible endpoint provider behavior. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/tests/test_llm_conversation.py | Adds tests for streaming, compaction, persistence, and interactive integration for LLMConversation. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/tests/test_interactive.py | Adds tests for session storage/menu behavior and profiling/optimize flows. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/tests/test_api_standalone.py | Adds standalone tests for public API, exceptions, serialization, and recommendation bucketing. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/tests/__init__.py | Marks ai_analysis tests package. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/share/amd_rocm_logo.png | Adds branding asset used by interactive UI. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/llm_conversation.py | Introduces persistent multi-turn LLM session with streaming + compaction + archive. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/exceptions.py | Adds typed exception hierarchy for AI analysis module. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/docs/LLM_GUIDE_SECTIONS.md | Documents context-tagged guide section filtering system. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/ai_analysis/__init__.py | Exposes public AI analysis API surface + lazy interactive imports. |
| projects/rocprofiler-sdk/source/lib/python/rocpd/__main__.py | Adds rocpd analyze CLI subcommand and argument validation. |
| projects/rocprofiler-sdk/source/bin/rocprofv3.py | Formatting-only change to env var update call. |
| projects/rocprofiler-sdk/cmake/Modules/rocprofiler-sdk-utilities.cmake | Avoids list(GET ...) errors when no GPUs are detected at configure time. |
| .gitignore | Ignores Claude session data, Python bytecode, and generated analysis output directory. |
…compat in CMake schema test

- llm_conversation.py: after parsing ROCPD_LLM_PRIVATE_HEADERS, validate the result is a dict and raise a clear ValueError if it is not (e.g. if the env var was set to a JSON array or string instead of an object)
- tests/rocprofv3/rocpd/CMakeLists.txt: replace importlib.resources.files() with pkgutil.get_data() in the inline schema-validate test so it works on Python 3.6 where importlib.resources.files() is not available; also replace f-strings with str concatenation for broad Python compatibility
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
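The env-var validation above can be sketched as below. The helper name and the empty-dict default for an unset variable are assumptions; the dict check and the ValueError come from the commit message:

```python
import json

# Hedged sketch: ROCPD_LLM_PRIVATE_HEADERS must decode to a JSON object
# (a dict after json.loads); arrays and bare strings are rejected with a
# clear error instead of failing later with a confusing one.
def parse_private_headers(env):
    raw = env.get("ROCPD_LLM_PRIVATE_HEADERS")
    if raw is None:
        return {}
    headers = json.loads(raw)
    if not isinstance(headers, dict):
        raise ValueError(
            "ROCPD_LLM_PRIVATE_HEADERS must be a JSON object, got "
            + type(headers).__name__
        )
    return headers

print(parse_private_headers({"ROCPD_LLM_PRIVATE_HEADERS": '{"X-Auth": "token"}'}))
# → {'X-Auth': 'token'}
```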
…own tag

- test_ai_analysis_standalone.py: test_kernel_name_shell_quoted_in_full_command was filtering for rocprofv3 commands but the kernel name only appears in the rocprof-compute command (rocprofv3 collects general PMC counters without kernel-name scoping). Switch filter to rocprof-compute where shlex.quote() is correctly applied.
- test_guide_filter_standalone.py: add tracelens_metrics to KNOWN_TAGS — this tag is used in llm_analyzer.py (_select_tags adds it when TraceLens data is present) and tagged in llm-reference-guide.md, but was missing from the vocabulary guard set causing test_all_tags_are_from_known_vocabulary to fail.
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
…or test scripts

configure_file(COPYONLY) runs only at cmake configure time, leaving stale copies in the build directory when test files are edited during development.

Introduce rocpd_stage_test_script() helper function that uses:
add_custom_command(OUTPUT ... DEPENDS <src>) + add_custom_target(ALL ...)

This means cmake --build re-copies any test file whose source has changed, without requiring the developer to re-run cmake configure. Also adds set_property(CMAKE_CONFIGURE_DEPENDS) so cmake does re-configure automatically when a CI system or fresh checkout triggers it.

Replace all configure_file COPYONLY calls for Python test scripts (both the tests/rocprofv3/rocpd/ originals and the ai_analysis/tests/ sub-package copies) with rocpd_stage_test_script().
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Motivation
AMD ROCm users need actionable guidance from their GPU profiling data, but interpreting raw
`rocprofv3` output requires deep GPU architecture knowledge. This PR introduces an AI-powered analysis module (`rocpd analyze`) that reads `.rpdtrace` databases and produces human-readable performance insights, bottleneck detection, and optimization recommendations, all without an internet connection or LLM API key.

The module follows a tiered progressive analysis strategy (Tier 0–4): users get immediately useful output from any profiling run, and deeper analysis becomes available as more data is collected.
Technical Details
New subcommand:
```
rocpd analyze [-i trace.db] [--source-dir ./src] [--interactive "<app>"]
```

Core analysis (`analyze.py`, `ai_analysis/`)
- Source-only mode (`--source-dir`): scans `.hip`/`.cpp`/`.cu`/`.py` files for GPU programming patterns (kernels, memcpy, sync, ROCTx, frameworks) and produces a profiling plan with a suggested first `rocprofv3` command and recommended PMC counters. Works without a `.db` file.
- Deeper, PMC-based analysis when a `pmc_events` table is present.
- Output formats: `text`, `json` (schema v0.1.x/v0.2.0), `markdown`, `webview` (self-contained AMD-themed HTML with SVG gauges, collapsible recommendation cards, sortable hotspot table, hover tooltips, light/dark toggle).
- LLM integration (`--llm anthropic|openai|private`): optional natural-language explanations via Anthropic Claude, OpenAI, or any OpenAI-compatible private/enterprise endpoint. Kernel names and paths are sanitized before transmission. Falls back gracefully when unavailable.
- Focused analysis (`--prompt`): target the analysis at a specific question (e.g. `--prompt "Why is my matmul kernel slow?"`).
- `_split_pmc_into_passes()` automatically separates TCC-derived counters (`FETCH_SIZE`, `WRITE_SIZE`) into dedicated passes to avoid hardware block limit errors (rocprofv3 error code 38).
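The pass-splitting rule can be sketched as follows. `TCC_DERIVED` and `split_pmc_into_passes` are illustrative names (the real helper is `_split_pmc_into_passes()`), and grouping all TCC-derived counters into a single dedicated pass is an assumption:

```python
# Illustrative sketch of the idea behind _split_pmc_into_passes(): TCC-derived
# counters (FETCH_SIZE, WRITE_SIZE) share a hardware block, so combining them
# with a full counter set can exceed the block limit (rocprofv3 error code 38).
TCC_DERIVED = {"FETCH_SIZE", "WRITE_SIZE"}

def split_pmc_into_passes(counters):
    tcc = [c for c in counters if c in TCC_DERIVED]
    rest = [c for c in counters if c not in TCC_DERIVED]
    passes = []
    if rest:
        passes.append(rest)
    if tcc:
        passes.append(tcc)  # dedicated pass for TCC-derived counters
    return passes
```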
Interactive workflow (`ai_analysis/interactive.py`)

Two session classes:
- `InteractiveSession`: menu-driven `[p]`/`[a]`/`[o]`/`[s]`/`[q]` loop launched after standard analysis:
  - a single `LLMConversation` is shared across all `[a]`/`[o]` calls in a session; history survives `--resume-session`
  - `LLMConversation` auto-compacts every N turns (configurable via `--llm-compact-every`) using an LLM-generated summary to stay within context limits
  - `rocprofv3` commands extracted from LLM responses are offered as a numbered run menu
  - the session is saved to `~/.rocpd/sessions/` on `[s]`, `[q]`, and Ctrl+C
- `WorkflowSession`: 7-phase automated profiling + optimization loop triggered by `--interactive "<app>"`:
  - multi-process runs pass the `rocprofv3` flags `--process-sync` and `-o results_%nid%` so each process writes its own DB; after profiling, per-process databases are merged via `rocpd.merge.merge_sqlite_dbs()` before analysis
  - a `.bak` backup is created before any edit
  - `[v]`/`r` reverts the last edit, prompts for error context, calls the LLM to analyze the failure and propose an alternative, then shows a what-next menu (`[f]` retry fix / `[p]` re-profile / `[q]` exit)
  - `[r]` re-profile → same INFO → `[r]` loops are broken by fingerprinting collected counters and flags across all prior runs
  - a `[b]` rollback menu in Phase 5 lets users revert to any prior state
  - state is persisted to `~/.rocpd/sessions/workflow_<ts>_<slug>.json`

WorkflowSession — Session Checkpoints
Each AI source-file edit creates a git-worktree checkpoint so the user can roll back to any prior state and blacklist approaches that caused regressions.
- `[b]` rollback menu in Phase 5: shows a checkpoint table with performance deltas; regression checkpoints are flagged; the user is prompted to blacklist before rollback
- blacklisted approaches are recorded in `WorkflowState.blacklisted_approaches` (never truncated by rollback)
- rollback uses `checkout <hash> -- <file>` (fast path) or a file-snapshot write (fallback when git is unavailable)
- `commit_files` stages only the AI-modified files (`git add -- <file>`), so in-progress user changes are never touched or included in checkpoint commits
- `_init_checkpoints` + `_prune_stale_worktrees` at start; `_teardown_checkpoints` in `finally` (removes worktrees; refs kept for GC protection)

LLM conversation (`ai_analysis/llm_conversation.py`)

New `LLMConversation` class replacing the previous `SessionContext` dict approach:
- streamed response chunks are accumulated with `list.append` + `"".join()` (O(n)) instead of string concatenation (O(n²)), avoiding quadratic allocation on long responses
- compaction keeps the `keep_recent_turns` most recent turns verbatim and summarizes older turns with a non-streaming LLM call
- history is appended to `~/.rocpd/sessions/<id>_history.jsonl`
- `to_dict()`/`from_dict()` for full session persistence and resume
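The O(n) streaming accumulation amounts to this pattern (a minimal sketch; the real `LLMConversation` carries far more state):

```python
class StreamAccumulator:
    """Accumulate streamed LLM response chunks in O(n) total time.

    Appending to a list and joining once avoids the O(n^2) behavior of
    repeated `buf += chunk` string concatenation on long responses.
    """

    def __init__(self):
        self._chunks = []

    def feed(self, chunk):
        self._chunks.append(chunk)    # O(1) amortized per chunk

    def text(self):
        return "".join(self._chunks)  # single O(n) join at the end
```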
LLM hardening
- `ROCPD_LLM_PRIVATE_HEADERS` dict validation: after `json.loads()` the result is validated to be a `dict`; a non-dict JSON value (e.g. an array) raises a `ValueError` with a clear message showing the expected format, rather than an opaque `TypeError` from `headers.update()`
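A minimal sketch of that validation, with `load_private_headers` as a hypothetical stand-in for the module's actual parsing code:

```python
import json
import os

def load_private_headers(env="ROCPD_LLM_PRIVATE_HEADERS"):
    """Hypothetical stand-in for the module's env-var parsing.

    The value must be a JSON *object*: its entries are merged into request
    headers via headers.update(), so a JSON array or string would otherwise
    surface as an opaque TypeError deep inside the HTTP layer.
    """
    raw = os.environ.get(env)
    if not raw:
        return {}
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError(
            '{} must be a JSON object, e.g. {{"X-Api-Key": "..."}}; '
            "got {}".format(env, type(parsed).__name__)
        )
    return parsed
```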
Build & packaging (`utilities.cmake`)
- `file(COPY ... DESTINATION ...)` replaces `configure_file ... COPYONLY` for AI analysis assets, fixing EPERM on binary files (e.g. PNG) during CMake configure
- `*.png` added to the `rocpd_AI_ANALYSIS_FILES` glob so `ai_analysis/share/amd_rocm_logo.png` (used by the interactive session banner) is installed alongside `.py`/`.md`/`.json` files
- `tracelens_port.py` added to `rocpd_PYTHON_SOURCES`
- guarded the `list(GET ...)` calls in `rocprofiler-sdk-utilities.cmake` with an early return when `rocminfo` returns an empty GPU list. Note: a GPU is required to run the integration tests; builds on GPU-less machines configure cleanly, but the test suite is not expected to pass without hardware.

Python 3.6 compatibility (RHEL 8.8 / SLES 15.6)
- `tracelens_port.py`: changed the `_CATEGORY_PATTERNS: List[Tuple[str, re.Pattern]]` annotation to `List[Tuple[str, Any]]`. `re.Pattern` was introduced in Python 3.7; Python 3.6 evaluates module-level annotations eagerly, causing an `AttributeError` at import time that cascaded into all tests importing `analyze.py` or `llm_analyzer.py`.
- `test_analyze_schema.py`: added a `try/except ImportError` shim for `importlib.resources` (Python 3.7+), falling back to `pkgutil.get_data()` on Python 3.6.
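The shim is roughly this shape. Package and resource names are illustrative (the real code lives in `test_analyze_schema.py`); note that `importlib.resources` arrived in 3.7 but its `files()` API only in 3.9, which the `AttributeError` branch covers:

```python
def read_packaged_schema(package="rocpd.ai_analysis",
                         name="analysis-output.schema.json"):
    # Illustrative fallback chain:
    #   3.9+  -> importlib.resources.files()
    #   3.7/8 -> AttributeError on .files(), fall back to pkgutil
    #   3.6   -> ImportError on importlib.resources, fall back to pkgutil
    try:
        from importlib import resources
        return resources.files(package).joinpath(name).read_bytes()
    except (ImportError, AttributeError):
        import pkgutil
        return pkgutil.get_data(package, name)
```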
Schema file corrections

The `analysis-output.schema.json` file was corrected to match the already-documented v0.2.0 specification. The emitted JSON format was never wrong; only the validator was:
- `profiling_mode` enum was missing `"source_only"`
- `analysis_tier` minimum was 1, now 0
- `execution_breakdown` type was `"object"` only, now `["object", "null"]`
- `tier0` property was undeclared
- `$id` embedded a version string, now the stable `"rocpd-ai-analysis-output"`

Tier 0 source-only JSON output (`schema_version: "0.2.0"`) now passes `jsonschema.validate()`.
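As a rough illustration of the corrected validator, here is a minimal schema fragment with those fixes applied. The surrounding property set and the enum's other values are assumptions, not the actual schema:

```python
import jsonschema  # third-party: pip install jsonschema

# Minimal illustrative fragment reflecting the corrections above.
schema = {
    "type": "object",
    "properties": {
        "schema_version": {"type": "string"},
        "profiling_mode": {"enum": ["full", "counters_only", "source_only"]},
        "analysis_tier": {"type": "integer", "minimum": 0},
        "execution_breakdown": {"type": ["object", "null"]},
        "tier0": {"type": "object"},
    },
}

# A Tier 0 source-only result now validates cleanly:
tier0_output = {
    "schema_version": "0.2.0",
    "profiling_mode": "source_only",
    "analysis_tier": 0,
    "execution_breakdown": None,
    "tier0": {},
}
jsonschema.validate(instance=tier0_output, schema=schema)
```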
Tests
- `tests/rocprofv3/rocpd/test_analyze.py`: 76 unit tests covering all recommendation rules, helper functions, the PMC filter, and output formatters
- `tests/rocprofv3/rocpd/test_analyze_schema.py`: 28 JSON schema conformance tests (v0.1.x, v0.2.0 source-only, and combined Tier 0 + Tier 1/2; was 17)
- `tests/rocprofv3/rocpd/test_ai_analysis_standalone.py`: 23 Python API unit tests (`analyze_database`, `analyze_source`, `AnalysisResult`)
- `tests/rocprofv3/rocpd/test_guide_filter_standalone.py`: LLM reference guide section filter tests
- `ai_analysis/tests/test_interactive.py`: 22 interactive session unit tests
- `ai_analysis/tests/test_llm_conversation.py`: `LLMConversation` streaming/compaction/persistence tests
- `ai_analysis/tests/test_workflow.py`: 52 `WorkflowSession` phase tests including full checkpoint system coverage (`CheckpointRecord`, `GitCheckpointManager`, rollback, blacklist, teardown, stale pruning)

JIRA ID
N/A
Test Plan
- `pytest --noconftest` from the build output directory
- `ctest -R rocpd-analyze` after a full build (requires AMD GPU)
- `merged_db.db` (2000 kernel dispatches + 64000 PMC samples) exercised for Tier 1/2 analysis and all four output formats

Test Result
- `test_analyze.py` unit tests pass
- `test_workflow.py` tests pass (checkpoint system coverage)
- `test_interactive.py` tests pass (no regressions)
- `test_llm_conversation.py` tests pass (no regressions)
- `jsonschema.validate()` passes for Tier 0, Tier 1/2, and combined JSON output

Submission Checklist